{ "cells": [ { "cell_type": "markdown", "id": "4e2a6d7c", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "We will use the Hugging Face Transformers library, which can be installed with pip:\n", "\n", "pip install transformers\n", "\n", "You will probably also want PyTorch installed, since Transformers needs a deep learning backend to run the models." ] }, { "cell_type": "markdown", "id": "a370a66a", "metadata": {}, "source": [ "## Exercise 1: Prompt Engineering 101\n", "\n", "The aim of this exercise is to understand how prompt structure affects LLM outputs.\n", "\n", "1. Use transformers.pipeline to interact with gpt2 or mistralai/Mistral-7B-Instruct-v0.1\n", "\n", "2. Try the following three prompt formats:\n", " * Instructional: \"Write a poem about data science in astronomy\"\n", " * Conversational: \"What can you tell me about data science in astronomy?\"\n", " * Completion-style: \"Data science in astronomy is the field of...\"\n", " \n", "3. Vary the sampling temperature and the top_k/top_p settings and observe how the outputs change\n" ] }, { "cell_type": "code", "execution_count": null, "id": "7de00d0b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "a2c455d9", "metadata": {}, "source": [ "## Exercise 2: Text Classification with AstroBERT\n", "\n", "The aim of this exercise is to use a pre-trained LLM to classify astronomical texts.\n", "\n", "1. Create a data set of 20 random sentences from abstracts on arXiv (https://arxiv.org/list/astro-ph/new)\n", "\n", "2. Using the AstroBERT model (\"adsabs/astroBERT\"), classify these sentences." ] }, { "cell_type": "code", "execution_count": null, "id": "6131a76a", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "1a1bcd38", "metadata": {}, "source": [ "## Exercise 3: LLMs as Zero-shot Annotators\n", "\n", "The aim of this exercise is to repurpose LLMs to do zero-shot labeling.\n", "\n", "1. Using the data set of sentences you created in Ex. 2, use a zero-shot classification pipeline to assign a topic label to each sentence." ] }, { "cell_type": "code", "execution_count": null, "id": "95200854", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "0340c335", "metadata": {}, "source": [ "## Exercise 4: Text Summarization and Evaluation\n", "\n", "The aim of this exercise is to assess how well LLMs summarize text.\n", "\n", "1. Take a few abstracts from arXiv and run them through a summarization pipeline using AstroBERT as the model.\n", "\n", "2. Compare the output with that obtained using model=\"sshleifer/distilbart-cnn-12-6\"." ] }, { "cell_type": "code", "execution_count": null, "id": "3d486fc7", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 5 }